AIbase

# Multimodal video understanding

## Qwen2.5 VL 32B Instruct GGUF
License: Apache-2.0
Qwen2.5-VL-32B-Instruct is a 32-billion-parameter vision-language model with improved mathematical reasoning and problem-solving, suited to multimodal tasks. This entry is a GGUF build of the model.
Tags: Image-to-Text, English
Publisher: unsloth
464
1
## Xclip Large Patch14 Kinetics 600
License: MIT
X-CLIP extends CLIP to general video-language understanding and is trained on video-text pairs with a contrastive objective. This checkpoint is the large variant with a patch size of 14, trained on Kinetics-600.
Tags: Text-to-Video, Transformers, English
Publisher: microsoft
124
5
© 2025 AIbase